Pandas is a powerful and easy-to-use library for data analysis. It has two main objects to represent data: Series and DataFrame.
Finding Help:
NumPy: base N-dimensional array package
SciPy: fundamental library for scientific computing
Matplotlib: comprehensive 2D plotting
IPython: enhanced interactive console
SymPy: symbolic mathematics
pandas: data structures and analysis
In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
A Series is a one-dimensional, array-like object.
In [2]:
x = pd.Series([1,2,3,4,5])
x
Out[2]:
Notice that pandas generated an index for your items.
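The generated index simply numbers the items from 0. As a small sketch, you can also supply your own labels through the `index` argument; the labels `a` to `e` below are made up for illustration:

```python
import pandas as pd

# Default: pandas generates the integer labels 0..4.
x = pd.Series([1, 2, 3, 4, 5])
default_labels = list(x.index)

# Hypothetical labels, chosen only for this example.
labeled = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
value_at_c = labeled["c"]  # look up a value by its label
```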
In [3]:
x + 100
Out[3]:
In [4]:
(x ** 2) + 100
Out[4]:
In [5]:
x > 2
Out[5]:
In [6]:
larger_than_2 = x > 2
larger_than_2
Out[6]:
In [7]:
larger_than_2.any()
Out[7]:
In [8]:
larger_than_2.all()
Out[8]:
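A boolean Series such as `larger_than_2` can also be used as a mask to select items. The following is a minimal sketch using the same values as above; the names `selected` and `n_matching` are introduced here for illustration:

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4, 5])
larger_than_2 = x > 2

# Indexing with a boolean Series keeps only the items where the mask is True.
selected = x[larger_than_2]

# True counts as 1, so sum() gives the number of matching items.
n_matching = int(larger_than_2.sum())
```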
In [9]:
def f(x):
    # Double even values, triple odd values.
    if x % 2 == 0:
        return x * 2
    else:
        return x * 3

x.apply(f)
Out[9]:
Avoid looping over your data.
The next two cells use %%timeit to compare a for loop with apply().
In [10]:
%%timeit
ds = pd.Series(range(10000))
for counter in range(len(ds)):
    ds[counter] = f(ds[counter])
In [11]:
%%timeit
ds = pd.Series(range(10000))
ds = ds.apply(f)
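For a simple rule like `f`, a fully vectorized expression is often faster still than `apply()`, because the condition is evaluated on the whole Series at once. The sketch below uses `np.where` as one possible approach; it is not part of the original notebook, and the timing on your machine should be checked with %%timeit:

```python
import numpy as np
import pandas as pd

def f(x):
    # Same rule as above: double even values, triple odd values.
    if x % 2 == 0:
        return x * 2
    else:
        return x * 3

ds = pd.Series(range(10000))

# np.where picks ds * 2 where the condition holds and ds * 3 elsewhere.
vectorized = pd.Series(np.where(ds % 2 == 0, ds * 2, ds * 3), index=ds.index)

# Check that the vectorized version agrees with apply(f).
same_result = bool((vectorized == ds.apply(f)).all())
```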
In [12]:
x.astype(np.float64)
Out[12]:
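Note that `astype()` returns a new Series and leaves the original untouched. A short sketch to confirm this (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

x = pd.Series([1, 2, 3, 4, 5])

# astype() builds a converted Series; x itself keeps its integer dtype.
as_float = x.astype(np.float64)

original_dtype = x.dtype
float_dtype = as_float.dtype
```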
In [13]:
y = x
In [14]:
y[0]
Out[14]:
In [15]:
y[0] = 100
In [16]:
y
Out[16]:
In [17]:
x
Out[17]:
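Assigning y = x does not copy the data: both names refer to the same Series, so changing y also changes x. Use copy() when you need an independent Series, but avoid it if you can to save memory.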
In [18]:
y = x.copy()
In [19]:
x[0]=1
In [20]:
x
Out[20]:
In [21]:
y
Out[21]:
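The difference between plain assignment and `copy()` can be summarized in one self-contained sketch. The names `alias` and `independent` are chosen here for illustration:

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4, 5])

alias = x               # another name for the same object, no data copied
independent = x.copy()  # a separate Series with its own data

x[0] = 100

alias_is_same_object = alias is x
alias_first = alias[0]        # sees the change made through x
copy_first = independent[0]   # still holds the original value
```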
In [22]:
x.describe(percentiles=[0.25, 0.75])
Out[22]:
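`describe()` accepts a `percentiles` argument, a list of fractions between 0 and 1, that controls which percentiles appear in the summary. A minimal sketch, with the 10th and 90th percentiles picked arbitrarily for illustration:

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4, 5])

# Ask for the 10th and 90th percentiles in the summary.
summary = x.describe(percentiles=[0.1, 0.9])

# The result is itself a Series, indexed by the name of each statistic.
count = summary["count"]
mean = summary["mean"]
```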
In [23]:
data = [1,2,3,4,5,6,7,8,9]
df = pd.DataFrame(data, columns=["x"])
In [24]:
df
Out[24]:
In [25]:
df["x"]
Out[25]:
In [26]:
df["x"][0]
Out[26]:
In [27]:
df["x_plus_2"] = df["x"] + 2
df
Out[27]:
In [28]:
import math

df["x_square"] = df["x"] ** 2
df["x_factorial"] = df["x"].apply(math.factorial)
df
Out[28]:
In [29]:
df["is_even"] = df["x"] % 2
df
Out[29]:
In [30]:
df["odd_even"] = df["is_even"].map({1:"odd", 0:"even"})
df
Out[30]:
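`map()` with a dictionary replaces each value by the entry stored under that key. One thing to watch for is that values without a matching key become NaN. A small sketch with a made-up Series that contains one unmapped value:

```python
import pandas as pd

# Illustrative data: the last value (2) has no entry in the dictionary.
remainders = pd.Series([1, 0, 1, 0, 2])

labels = remainders.map({1: "odd", 0: "even"})

# Count the values that could not be mapped.
n_missing = int(labels.isna().sum())
```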
In [31]:
df = df.drop("is_even", axis=1)
df
Out[31]:
In [32]:
df[["x", "odd_even"]]
Out[32]:
In [33]:
pd.options.display.max_columns = 60
pd.options.display.max_rows = 6
pd.options.display.notebook_repr_html = False
df
Out[33]:
In [34]:
df[df["odd_even"] == "odd"]
Out[34]:
In [35]:
df[df.odd_even == "even"]
Out[35]:
In [36]:
df[(df.odd_even == "even") | (df.x_square < 20)]
Out[36]:
In [37]:
df[(df.odd_even == "even") & (df.x_square < 20)]
Out[37]:
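Conditions are combined with `|` (or), `&` (and) and `~` (not). The parentheses around each comparison matter, because these operators bind more tightly than `==` and `<`. The sketch below rebuilds a small version of `df` so that it runs on its own, and stores each condition in a named mask (the mask names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"x": range(1, 10)})
df["x_square"] = df["x"] ** 2
df["odd_even"] = (df["x"] % 2).map({1: "odd", 0: "even"})

is_even = df["odd_even"] == "even"
is_small = df["x_square"] < 20

both = df[is_even & is_small]        # rows matching both conditions
either = df[is_even | is_small]      # rows matching at least one condition
neither = df[~(is_even | is_small)]  # rows matching none of them
```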
In [38]:
df[(df.odd_even == "even") & (df.x_square < 20)]["x_plus_2"][:1]
Out[38]:
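Row filtering and column selection can also be done in a single step with `.loc`, which takes the row condition first and the column label second. This is a sketch of an alternative spelling, using a rebuilt copy of `df` so that it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({"x": range(1, 10)})
df["x_plus_2"] = df["x"] + 2
df["x_square"] = df["x"] ** 2
df["odd_even"] = (df["x"] % 2).map({1: "odd", 0: "even"})

# Rows: even and x_square < 20. Column: x_plus_2. Then keep the first match.
condition = (df["odd_even"] == "even") & (df["x_square"] < 20)
first_match = df.loc[condition, "x_plus_2"].head(1)
```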
In [39]:
pd.plotting.scatter_matrix(df, diagonal="kde", figsize=(10, 10));
In [40]:
df.describe()
Out[40]:
In [41]:
url = "http://www.google.com/finance/historical?q=TADAWUL:TASI&output=csv"
stocks_data = pd.read_csv(url)
In [42]:
stocks_data
Out[42]:
In [43]:
stocks_data["change_amount"] = stocks_data["Close"] - stocks_data["Open"]
# Change relative to the opening price, expressed in percent.
stocks_data["change_percentage"] = stocks_data["change_amount"] / stocks_data["Open"] * 100
stocks_data
Out[43]:
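To see the calculation without downloading anything, here is a sketch on a tiny hand-made table. The prices are invented for illustration and are not real market data; only the column names `Open` and `Close` come from the cells above. Dividing by the opening price is an assumption here, being one common convention for a daily change:

```python
import pandas as pd

# Made-up prices for illustration only.
prices = pd.DataFrame({"Open": [100.0, 200.0], "Close": [110.0, 190.0]})

prices["change_amount"] = prices["Close"] - prices["Open"]

# Change relative to the opening price, expressed in percent.
prices["change_percentage"] = prices["change_amount"] / prices["Open"] * 100
```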